Abstract
In this chapter, we present the MetaOmics software suite for combining multiple transcriptomic studies in a meta-analysis. MetaOmics contains more than a dozen in-house-developed methods and consists of seven subpackages for different data analysis and biological objectives: MetaQC for quality control assessment, MetaDE for differentially expressed gene detection, MetaPath for pathway enrichment analysis, MetaPCA for dimension reduction, MetaClust for clustering analysis, MetaNetwork for network analysis, and MetaPredict for prediction analysis. With the growing volume of experimental data accumulated in the public domain, applying omics meta-analysis methods provides increased statistical power and validated conclusions, which can improve disease treatment and the understanding of underlying mechanisms.
Introduction
With the advances in high-throughput experimental technology in the past decades, the production of genomic data has become affordable, and large genomic datasets are prevalent in recent biomedical research. Effective data management and analysis tools are essential to fully decipher the biological information contained in this tremendous amount of experimental data. In the past decade, enormous bodies of transcriptomic data have been accumulated from microarray experiments, which has resulted in several large public data repositories, such as Gene Expression Omnibus (GEO) and ArrayExpress. The recent development of next-generation sequencing (NGS) technology has accelerated data accumulation in databases such as the Sequence Read Archive (SRA). In general, however, an individual study often has a small or moderate sample size. As a result, the statistical power of candidate marker or pathway detection in each study is often limited, the reproducibility of the conclusions is relatively low, and the generalizability of the inferred information has been frequently criticized. Combining multiple studies has emerged as an appealing practice because it improves statistical power and estimation accuracy, and it may also help validate the final conclusions. Many “transcriptomic meta-analysis” methods have been developed and widely applied in real data analysis. In the literature, however, most of these methods were proposed to identify candidate marker genes differentially expressed between two or more conditions. Similar “meta-analysis” ideas can be extended to enriched pathway detection, clustering analysis, dimension reduction, network analysis, and disease classification (see Ramasamy et al. (2008) and Tseng et al. (2012) for more details). In this chapter, we first introduce statistical methods in the “MetaOmics” software suite, including those still under development in our lab.